Theoretical Foundations of Entity Resolution Models

Authors

  • Csaba István Sidló
  • András József Molnár
  • András A. Benczúr
  • János Demetrovics
Abstract

Data quality is crucial in all information systems. As a key step in obtaining clean data, record linkage or entity resolution (ER) groups database records by the underlying real world entities. In this paper we give practical motivating examples and review the available ER formal models. The formal model for matching and merging records determines not just the power and quality, but also the algorithmic cost of the resolution process. Starting from a naive definition that may lead to unbounded entities or infinite loops and also discussing the shortcomings of the standard axioms, we give algebraic properties that lead to efficient record partitioning. Finally we describe algorithms suitable for complex entity resolution problems that may include fuzzy clustering to split a partition of records into potentially overlapping entities.
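To make the idea of grouping records by pairwise matching concrete, the following is a minimal sketch, not one of the algorithms from the paper. It assumes a user-supplied Boolean match function and takes the transitive closure of the pairwise matches with a union-find structure. The names `resolve` and `shares_key`, the record layout, and the sample data are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def shares_key(a, b):
    """Hypothetical match rule: two records match if they share a
    non-empty email address or a non-empty phone number."""
    return any(a[k] is not None and a[k] == b[k] for k in ("email", "phone"))


def resolve(records, match):
    """Partition records into groups by taking the transitive closure
    of the pairwise match relation (naive, quadratic in the input size)."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if match(records[i], records[j]):
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # union the two groups

    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())


# Illustrative sample data (invented for this sketch).
records = [
    {"name": "J. Smith", "email": "js@example.com", "phone": None},
    {"name": "John Smith", "email": "js@example.com", "phone": "555-0100"},
    {"name": "Jon Smith", "email": None, "phone": "555-0100"},
    {"name": "A. Jones", "email": "aj@example.com", "phone": None},
]
entities = resolve(records, shares_key)
```

In this sample the first and third records never match directly, yet they land in the same group because both match the second record. Chaining of this kind is convenient, but it also shows how a purely pairwise definition can pull loosely related records into one entity.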

Similar resources

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution

This paper describes a series of experiments using logistic regression as a machine learning method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as linked or not linked, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...

Review and Evaluation of Disability Models in Rehabilitation Sciences

Background: Over three decades, researchers have tried to clarify the epistemological foundations of the field of disability. These models play an important role in providing theoretical and practical support and services for people with special needs. The purpose of this study is to evaluate models of disability rehabilitation. Method: The study used a theoretical research method. For this purpose, th...

Corpus based coreference resolution for Farsi text

"Coreference resolution", or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Examining the Requirements for Single-Digit Inflation in Iran's Economy

This paper examines the important factors in determining inflation and the requirements for achieving single-digit inflation in Iran. For this purpose, the study draws on the theoretical foundations and empirical studies of inflation, the transition experience of countries with high inflation rates, the economic conditions of Iran, and an appropriate econometric model. As to be expected,...

Effectiveness of Teaching General Courses on Theoretical Foundations of Islam on Religious Identity and Academic Resilience of Sports Science Students at Ilam University

Background and objectives: Promoting the theoretical foundations of Islam has a role in determining the individual and social behavior of students. Therefore, this study aimed to investigate the effect of teaching general courses on the theoretical foundations of Islam on the religious identity and academic resilience of Sports Science students at Ilam University. Materials and Methods: The me...

Publication date: 2014